Effect of forename string on author name disambiguation

Authors: Kim Jenna; Kim Jinseok
Publication venue: 'Wiley'
Publication date: 01/07/2020
In author name disambiguation, author forenames are used to decide which name instances are disambiguated together and how likely they are to refer to the same author. Despite this crucial role, the effect of forenames on the performance of heuristic (string matching) and algorithmic disambiguation is not well understood. This study assesses the contributions of forenames in author name disambiguation using multiple labeled data sets under varying ratios and lengths of full forenames, reflecting real‐world scenarios in which an author is represented by forename variants (synonyms) and some authors share the same forenames (homonyms). The results show that increasing the ratio of full forenames substantially improves both heuristic and machine‐learning‐based disambiguation. Performance gains from algorithmic disambiguation are pronounced when many forenames are initialized or homonyms are prevalent. As the ratio of full forenames increases, however, these gains become marginal compared to those from string matching. Using only a small portion of each forename string does not much reduce the performance of either heuristic or algorithmic disambiguation compared to using full‐length strings. These findings provide practical suggestions, such as restoring initialized forenames to a full‐string format via record linkage to improve disambiguation performance.

Peer Reviewed

https://deepblue.lib.umich.edu/bitstream/2027.42/155924/1/asi24298.pdf

https://deepblue.lib.umich.edu/bitstream/2027.42/155924/2/asi24298_am.pd

arXiv.org e-Print Archive

Deep Blue Documents at the University of Michigan

Scale‐free collaboration networks: An author name disambiguation perspective

Author: Kim Jinseok
Publication venue: 'Wiley'
Publication date: 07/11/2018
Peer Reviewed

https://deepblue.lib.umich.edu/bitstream/2027.42/149559/1/asi24158.pdf

https://deepblue.lib.umich.edu/bitstream/2027.42/149559/2/asi24158_am.pd

arXiv.org e-Print Archive

Crossref

Deep Blue Documents at the University of Michigan

A Syllable-based Technique for Word Embeddings of Korean Words

Authors: Choi Sanghyuk; Kim Taeuk; Lee Sang-goo; Seol Jinseok
Publication date: 01/01/2017
Word embedding has become a fundamental component of many NLP tasks such as named entity recognition and machine translation. However, popular models that learn such embeddings are unaware of the morphology of words, so they are not directly applicable to highly agglutinative languages such as Korean. We propose a syllable-based learning model for Korean using a convolutional neural network, in which each word representation is composed of trained syllable vectors. Our model successfully produces morphologically meaningful representations of Korean words compared to the original Skip-gram embeddings. The results also show that it is quite robust to the Out-of-Vocabulary problem.

Comment: 5 pages, 3 figures, 1 table. Accepted for EMNLP 2017 Workshop - The 1st Workshop on Subword and Character level models in NLP (SCLeM

arXiv.org e-Print Archive

Crossref

Hybrid Deep Learning Architecture to Forecast Maximum Load Duration Using Time-of-Use Pricing Plans

Authors: Kim Jinseok; Kim Ki Il; Shah Babar
Publication venue: ZU Scholars
Publication date: 22/03/2021
Load forecasting using machine learning or deep learning models has received considerable research attention as a way to reduce peak load and contribute to the stability of the power grid. In particular, an adequate model is needed to forecast the maximum load duration based on time-of-use, an electricity usage fare policy, in order to achieve goals such as peak load reduction in a power grid. However, existing single machine learning or deep learning forecasting models cannot easily avoid overfitting. Moreover, a majority of the ensemble or hybrid models do not achieve optimal results for forecasting the maximum load duration based on time-of-use. To overcome these limitations, we propose a hybrid deep learning architecture to forecast maximum load duration based on time-of-use. Experimental results indicate that this architecture achieves the highest average of recall and accuracy (83.43%) compared to benchmark models. To verify the effectiveness of the architecture, a further experiment shows that an energy storage system (ESS) scheme operated in accordance with the forecast results of the proposed model (LSTM-MATO) in the architecture could provide peak load cost savings of 17,535,700 KRW each year compared with the original peak load costs without the method. Therefore, the proposed architecture could be utilized for practical applications such as peak load reduction in the grid.

ZU Scholars (Zayed University)